AITopics | data diversification


data diversification

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Appendix for Data Diversification: A Simple Strategy For Neural Machine Translation

Xuan-Phi Nguyen

Neural Information Processing Systems | Feb-8-2026, 21:46:07 GMT

Finally, we describe the training setup for our back-translation experiments. We continue to differentiate our method from other existing works. Our method does not train multiple peer models with EM training either. In each round, a forward (or backward) model takes its turn playing the "back-translation" role to train […]; the role is switched in the next round. In other words, source and target are identical.

artificial intelligence, experiment, natural language, (16 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > Canada (0.04)
Europe > Germany > Berlin (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


Data Diversification: A Simple Strategy For Neural Machine Translation

Neural Information Processing Systems | Dec-24-2025, 04:21:45 GMT

We introduce Data Diversification: a simple but effective strategy to boost neural machine translation (NMT) performance. It diversifies the training data by using the predictions of multiple forward and backward models and then merging them with the original dataset on which the final NMT model is trained. Our method is applicable to all NMT models. Unlike back-translation, it does not require extra monolingual data, and unlike ensembles of models, it does not add extra computation or parameters. Our method achieves state-of-the-art BLEU scores of 30.7 and 43.7 in the WMT'14 English-German and English-French translation tasks, respectively. It also substantially improves on 8 other translation tasks: 4 IWSLT tasks (English-German and English-French) and 4 low-resource translation tasks (English-Nepali and English-Sinhala). We demonstrate that our method is more effective than knowledge distillation and dual learning, that it exhibits strong correlation with ensembles of models, and that it trades perplexity off for better BLEU scores.
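A minimal sketch of one round of the merging step described in the abstract, assuming each trained model is a plain Python callable that maps a sentence to its translation. The names `diversify`, `forward_models`, and `backward_models` are hypothetical, model training is out of scope, and the paper's exact procedure (how many models, how many rounds, any filtering or deduplication) is not specified on this page.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]            # (source sentence, target sentence)
Translator = Callable[[str], str]  # hypothetical stand-in for a trained NMT model


def diversify(data: List[Pair],
              forward_models: List[Translator],
              backward_models: List[Translator]) -> List[Pair]:
    """Return the original pairs plus synthetic pairs from every model.

    Hypothetical helper: forward models translate source -> target,
    backward models translate target -> source. Duplicates are kept.
    """
    augmented: List[Pair] = list(data)  # start from the original parallel data
    for fwd in forward_models:
        # Each forward model pairs every original source with a synthetic target.
        augmented.extend((src, fwd(src)) for src, _ in data)
    for bwd in backward_models:
        # Each backward model pairs a synthetic source with every original target.
        augmented.extend((bwd(tgt), tgt) for _, tgt in data)
    return augmented


if __name__ == "__main__":
    # Toy "models" standing in for trained NMT systems.
    corpus = [("a b", "X Y")]
    fwd_models = [str.upper, lambda s: s[::-1]]
    bwd_models = [str.lower]
    for pair in diversify(corpus, fwd_models, bwd_models):
        print(pair)
```

With k forward and k backward models, this sketch grows an N-pair dataset to (2k + 1)N pairs in one round; the final model would then be trained on the merged set.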

data diversification, neural machine translation, simple strategy, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.78)


Appendix for Data Diversification: A Simple Strategy For Neural Machine Translation

Xuan-Phi Nguyen

Neural Information Processing Systems | Oct-3-2025, 05:38:27 GMT

artificial intelligence, experiment, natural language, (16 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > Canada (0.04)
Europe > Germany > Berlin (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


Data Diversification: A Simple Strategy For Neural Machine Translation

Xuan-Phi Nguyen

Neural Information Processing Systems | Oct-3-2025, 05:38:20 GMT

Our method is applicable to all NMT models.

artificial intelligence, computational linguistic, natural language, (15 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


Review for NeurIPS paper: Data Diversification: A Simple Strategy For Neural Machine Translation

Neural Information Processing Systems | Jan-25-2025, 17:00:21 GMT

Weaknesses: While the described approach is simple and very generally applicable, there are some major issues with the evaluation that need to be addressed. If 1. and 2. are addressed, I would be willing to update my scores. The BLEU evaluation is not clearly described for the WMT and IWSLT experiments. Given the major variations observed in BLEU scores due to differences in post-processing or in the BLEU evaluation script used, it is hard to compare fairly against previous work without a clear description of the post-processing, tokenization, and BLEU evaluation tool used for these experiments. Since the proposed method relies heavily on backward- and forward-translated data, these effects are bound to affect the observed BLEU improvements.
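To illustrate the reviewer's point about tokenization, here is a minimal corpus-level BLEU sketch (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty, no smoothing, one reference per sentence). The function names and example sentences are hypothetical, and this is not the scoring script used in the paper or by any particular tool. It scores the same hypothesis and reference twice: once split on whitespace, and once with the period detached as its own token.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU (0-100) with a single reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


def split_period(sentence: str) -> List[str]:
    """Hypothetical tokenizer: detach periods, then split on whitespace."""
    return sentence.replace(".", " .").split()


hyp = "the cat sat on the mat."
ref = "the cat sat on a mat."

score_ws = corpus_bleu([hyp.split()], [ref.split()])
score_punct = corpus_bleu([split_period(hyp)], [split_period(ref)])
print(f"whitespace tokens: {score_ws:.2f}")
print(f"period split off:  {score_punct:.2f}")
```

The two scores come out at about 53.73 and 48.89 for identical strings, which is why reported BLEU numbers are only comparable when the tokenization and scorer are stated; standardized scorers such as sacreBLEU were created for this reason.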

data diversification, neural machine translation, simple strategy, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


Review for NeurIPS paper: Data Diversification: A Simple Strategy For Neural Machine Translation

Neural Information Processing Systems | Jan-25-2025, 17:00:14 GMT

This work describes a simple approach to synthetically augment the training dataset for neural machine translation. The proposed approach involves training multiple forward and backward MT models and appending their outputs on the original training dataset to the training data. This augmented (or diversified) training dataset can then be used to train the next generation of models. The proposed approach is simple, achieves good results, and the authors do a good job presenting the idea. The paper is quite empirical and the technique fairly specific to NMT, but it is still interesting to see that sometimes simple ideas work well and are thus important / deserve careful consideration.

data diversification, neural machine translation, training dataset, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


Data Diversification: A Simple Strategy For Neural Machine Translation

Neural Information Processing Systems | Oct-10-2024, 12:37:24 GMT

data diversification, neural machine translation, translation task, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training

Xin, Chen, Hartel, Andreas, Kasneci, Enkelejda

arXiv.org Artificial Intelligence | Jul-12-2024

Swift and accurate detection of specified objects is crucial for many industrial applications, such as safety monitoring on construction sites. However, traditional approaches rely heavily on arduous manual annotation and data collection, which struggle to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline designed to streamline the entire workflow of an object detection application from data collection to model deployment. DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios. It employs a subject-driven image generation module (DreamBooth with SDXL) for data diversification, followed by an annotation stage where open-vocabulary object detection (Grounding DINO) generates bounding box annotations for both generated and original images. These pseudo-labels are then reviewed by a large multimodal model (GPT-4o) to guarantee credibility before serving as ground truth to train real-time object detectors (YOLO). We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832. Furthermore, we adopt a modular design for DART to ensure easy exchangeability and extensibility. This allows for a smooth transition to more advanced algorithms in the future, seamless integration of new object categories without manual labeling, and adaptability to customized environments without extra data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
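The modular, exchangeable design described in this abstract can be pictured as a chain of pluggable stages. This is a minimal illustration, not the DART code from the linked repository: the stage signatures, the `Sample` record, the box format, and the stub stages are all hypothetical stand-ins for DreamBooth with SDXL (generation), Grounding DINO (annotation), GPT-4o (review), and YOLO (training).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[str, float, float, float, float]  # (label, x1, y1, x2, y2), hypothetical format


@dataclass
class Sample:
    image: str                                  # image path or identifier
    boxes: List[Box] = field(default_factory=list)
    approved: bool = False


@dataclass
class Pipeline:
    # Each stage is a plain callable, so any one can be swapped independently.
    generate: Callable[[List[str]], List[str]]  # originals -> extra generated images
    annotate: Callable[[str], List[Box]]        # image -> pseudo-label boxes
    review: Callable[[Sample], bool]            # sample -> accept as ground truth?
    train: Callable[[List[Sample]], object]     # approved samples -> trained detector

    def run(self, originals: List[str]) -> object:
        # 1. Data diversification: add generated images to the originals.
        images = list(originals) + self.generate(originals)
        # 2. Annotation: pseudo-label every image, generated or original.
        samples = [Sample(img, self.annotate(img)) for img in images]
        # 3. Review: mark which pseudo-labels the reviewer accepts.
        for s in samples:
            s.approved = self.review(s)
        # 4. Training: fit the detector on approved samples only.
        return self.train([s for s in samples if s.approved])


if __name__ == "__main__":
    # Stub stages; real ones would wrap the models named in the abstract.
    pipe = Pipeline(
        generate=lambda imgs: [f"gen_{i}" for i in imgs],
        annotate=lambda img: [("excavator", 0.0, 0.0, 1.0, 1.0)] if img.startswith("gen_") else [],
        review=lambda s: len(s.boxes) > 0,
        train=lambda data: len(data),
    )
    print(pipe.run(["a.png", "b.png"]))  # number of approved training samples
```

Keeping each stage behind a single callable signature is one way to get the "easy exchangeability" the abstract mentions: replacing the annotator or detector only requires passing a different function.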

arxiv, photorealistic image, placeholder, (13 more...)

arXiv.org Artificial Intelligence

2407.09174

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Materials > Metals & Mining (1.00)
Machinery > Construction Machinery & Heavy Trucks (1.00)
Energy (1.00)
Construction & Engineering (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)


KIT's Multilingual Speech Translation System for IWSLT 2023

Liu, Danni, Nguyen, Thai Binh, Koneru, Sai, Ugan, Enes Yavuz, Pham, Ngoc-Quan, Nguyen, Tuan-Nam, Dinh, Tu Anh, Mullov, Carlos, Waibel, Alexander, Niehues, Jan

arXiv.org Artificial Intelligence | Jul-12-2023

Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and terminology-dense content. The task requires translation into 10 languages with varying amounts of resources. In the absence of training data from the target domain, we use a retrieval-based approach (kNN-MT) for effective adaptation (+0.8 BLEU for speech translation). We also use adapters to easily integrate incremental training data from data augmentation, and show that this matches the performance of re-training. We observe that cascaded systems are more easily adaptable towards specific target domains, due to their separate modules. Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation, although their performance remains similar on TED talks.
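For readers unfamiliar with kNN-MT, the sketch below shows the commonly used formulation of a single decoding step: retrieve the k nearest datastore entries (decoder hidden-state keys paired with the target token that followed them), turn their negative squared distances into a distribution over tokens, and interpolate it with the model's own distribution. This is a generic illustration under stated assumptions; the toy 2-d datastore, k, the temperature, and the interpolation weight `lam` are hypothetical, and the KIT system's actual settings are not given here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_distribution(query: Vector,
                     datastore: List[Tuple[Vector, str]],
                     k: int = 2,
                     temperature: float = 1.0) -> Dict[str, float]:
    """p_kNN over target tokens from the k nearest (key, token) entries."""
    neighbors = sorted(datastore, key=lambda entry: sq_dist(query, entry[0]))[:k]
    weights: Dict[str, float] = defaultdict(float)
    for key, token in neighbors:
        # Closer neighbors get exponentially larger weight.
        weights[token] += math.exp(-sq_dist(query, key) / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def interpolate(p_mt: Dict[str, float],
                p_knn: Dict[str, float],
                lam: float = 0.5) -> Dict[str, float]:
    """Final distribution: lam * p_kNN + (1 - lam) * p_MT."""
    vocab = set(p_mt) | set(p_knn)
    return {tok: lam * p_knn.get(tok, 0.0) + (1 - lam) * p_mt.get(tok, 0.0)
            for tok in vocab}


if __name__ == "__main__":
    # Toy in-domain datastore: 2-d "hidden states" paired with the next token.
    store = [([0.0, 0.0], "BLEU"), ([0.1, 0.0], "BLEU"), ([5.0, 5.0], "blue")]
    p_knn = knn_distribution([0.0, 0.1], store, k=2)
    p_mt = {"blue": 0.7, "BLEU": 0.3}  # general-domain model prefers "blue"
    print(interpolate(p_mt, p_knn, lam=0.5))
```

Here both nearest neighbors carry the in-domain term, so retrieval shifts probability mass toward it (0.65 vs. 0.35) without retraining the model, which is the kind of domain adaptation the abstract attributes to kNN-MT.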

artificial intelligence, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2306.0532

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
(17 more...)

Genre: Research Report (0.82)

Industry: Education (0.55)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
